Genome comparisons reveal accessory genes crucial for the evolution of apple Glomerella leaf spot pathogenicity in Colletotrichum fungi

Abstract Apple Glomerella leaf spot (GLS) is an emerging fungal disease caused by Colletotrichum fructicola and other Colletotrichum species. These species are polyphyletic and it is currently unknown how these pathogens convergently evolved to infect apple. We generated chromosome‐level genome assemblies of a GLS‐adapted isolate and a non‐adapted isolate in C. fructicola using long‐read sequencing. Additionally, we resequenced 17 C. fructicola and C. aenigma isolates varying in GLS pathogenicity using short‐read sequencing. Genome comparisons revealed a conserved bipartite genome architecture involving minichromosomes (accessory chromosomes) shared by C. fructicola and other closely related species within the C. gloeosporioides species complex. Moreover, two repeat‐rich genomic regions (1.61 Mb in total) were specifically conserved among GLS‐pathogenic isolates in C. fructicola and C. aenigma. Single‐gene deletion of 10 accessory genes within the GLS‐specific regions of C. fructicola identified three that were essential for GLS pathogenicity. These genes encoded a putative non‐ribosomal peptide synthetase, a flavin‐binding monooxygenase and a small protein with unknown function. These results highlight the crucial role accessory genes play in the evolution of Colletotrichum pathogenicity and imply the significance of an unidentified secondary metabolite in GLS pathogenesis.


| INTRODUCTION
The genus Colletotrichum (Glomerellales, Ascomycota) is one of the most important groups of fungal plant pathogens (Dean et al., 2012). It contains over 200 species belonging to at least 16 species complexes (Bhunjun et al., 2021; Liu et al., 2022). The genus collectively infects nearly 3000 plant species and causes significant harm to many valuable fruits, vegetables, ornamentals and cereals (Cannon et al., 2012; O'Connell et al., 2012). On apple (Malus domestica), Colletotrichum species cause two significant diseases, apple bitter rot (ABR) and Glomerella leaf spot (GLS) (Chen et al., 2022; Rockenbach et al., 2016; Velho et al., 2015). ABR is a long-standing disease worldwide that has been documented since the 1800s (Alwood, 1890; Brook, 1977). It causes extensive watery rot on near-mature or in-storage fruits. GLS is a foliar disease that damages young leaves by producing irregular necrotic lesions and defoliation. GLS also damages apple fruits by producing small sunken lesions that, however, do not progress into water-soaking rot. GLS was initially described in the 1970s (Taylor, 1971) and is currently only found in the United States, Brazil, China and Uruguay (Alaniz et al., 2019; Chen et al., 2022; Gonzalez et al., 2006; Hamada et al., 2019). In the field, all apple cultivars are susceptible to ABR, but only those in the Golden Delicious group, specifically Gala, are susceptible to GLS (Liu et al., 2016; Rockenbach et al., 2016).
The differences in symptoms, history of occurrence and host cultivar and tissue specificity between GLS and ABR diseases suggest distinct pathogenic mechanisms. Pathogens that cause GLS and ABR are both taxonomically diverse and polyphyletic (Chen et al., 2022; Liang et al., 2022). Nine distinct species belonging to the Colletotrichum gloeosporioides species complex (CGSC), the Colletotrichum acutatum species complex (CASC) and the Colletotrichum boninense species complex (CBSC) have been reported as capable of causing GLS. Meanwhile, more than 10 ABR species belonging to the CGSC and the CASC have been reported. Interestingly, certain Colletotrichum species (e.g., C. fructicola) contain specialized isolates capable of causing either GLS or ABR (Rockenbach et al., 2016). Such intraspecific pathogenic differentiation and polyphyletic distribution of GLS pathogen species indicate multiple origins of GLS pathogenicity, possibly through horizontal transfer of a pathogenicity determinant(s). While comparisons between GLS and ABR isolates have been made in terms of hydrolytic enzymatic activities (Velho et al., 2018) and global transcriptomes (Jiang et al., 2022), genetic processes underpinning the evolutionary origin of GLS pathogenicity remain mostly unclear.
Filamentous plant pathogens, including phylogenetically unrelated fungi and oomycetes, interact antagonistically with their plant hosts, creating a co-evolutionary arms race between pathogenicity and defence. For these pathogens, genome compartmentalization is a key host adaptation and virulence-related genes are more likely to be found in rapidly evolving genomic regions such as gene-sparse and repeat-rich regions, AT-rich isochores or accessory chromosomes (Dong et al., 2015; Frantzeskakis et al., 2019). Accessory genes located within these regions promote pathogen infection by acting as phytoalexin-detoxifying enzymes, plant defence suppressors, host-selective toxins and other defined and undefined functions (Bertazzoni et al., 2018). Over evolutionary time, accessory genes also promote the expansion of pathogenicity towards new plant varieties or species, posing a continual threat to crop production (Bertazzoni et al., 2018).
To analyse the genetic factor(s) associated with GLS pathogenicity, we generated chromosome-level genome assemblies of two C. fructicola isolates, 1104-7 and LJ19, derived from an apple GLS lesion and a bell pepper (Capsicum annuum) fruit, respectively. Artificial inoculation showed that 1104-7 is GLS pathogenic, whereas LJ19 is GLS nonpathogenic. We performed genome comparison between the two isolates and among closely related species belonging to CGSC. We found that C. fructicola and other CGSC species share a conserved bipartite genome architecture pattern, with two short minichromosomes exhibiting high inter- and intraspecific variation. Furthermore, using 17 additional C. fructicola and C. aenigma isolates derived from different plant hosts and being either pathogenic or nonpathogenic for GLS, we performed genome resequencing analysis and identified two genomic regions that are GLS specific. Through deletion analysis of 10 accessory genes within these two GLS-specific regions in C. fructicola, we identified three that are important for GLS pathogenicity. The results of our study shed light on virulence mechanisms and evolution in Colletotrichum fungi and emphasize the significance of accessory genes in controlling apple GLS pathogenicity.

| Near-complete genome assemblies of two C. fructicola isolates
Nanopore sequencing generated reads with total sizes of 7.56 Gb (N50 length = 27.87 kb) and 27.46 Gb (N50 length = 28.97 kb) for the GLS-pathogenic isolate 1104-7 and GLS-nonpathogenic isolate LJ19, respectively. From these reads, two genome assemblies were produced with NextDenovo, with total lengths of 58.55 and 55.69 Mb (Table S1). There were 12 scaffolds in each assembly. For 1104-7, five scaffolds had copies of a telomeric repeat (TTAGGG) on both ends and three scaffolds had telomeric repeats on one end. Aligned reads at six additional scaffold ends extended outwards to reach telomeric repeats, indicating that these ends are nearly complete (Table S2). For LJ19, five scaffolds had repeats on both ends and six scaffolds had repeats on one end, and three additional scaffold ends were close to telomeric repeats. Previously generated Hi-C reads for 1104-7 (Liang et al., 2020) were used to validate the accuracy of the 1104-7 assembly. In the Hi-C contact map, the strongest signal was diagonal and each contig contained a putative centromere, indicating a correspondence between each assembled contig and a chromosome (Figure 1a). A high degree of chromosome collinearity was seen between 1104-7 and LJ19, with no interchromosomal translocation events detected (Figure 1b). In both assemblies, the 3′ end of scaffold S4 corresponded to ribosomal DNA repeats, whereas scaffolds S11 and S12 corresponded to putative minichromosomes (length <1 Mb). Using a rigorous criterion (DNA identity >99%, length >10 kb), 92.23% (52.29 Mb) of LJ19 genomic DNA could be aligned to the 1104-7 reference genome (Figure 1c), whereas 89.30% (52.84 Mb) of 1104-7 genomic DNA could be aligned to LJ19 using the same criterion (Figure 1d).
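The telomere check described above can be illustrated with a minimal Python sketch. It assumes a scaffold is available as a plain DNA string and looks for tandem runs of the TTAGGG unit (or its reverse complement, CCCTAA) within a fixed window at each end. The window size (200 bp), the minimum run length (three units) and the function names are illustrative assumptions, not parameters reported in this study.

```python
import re
from typing import Tuple

TELOMERE_UNIT = "TTAGGG"  # telomeric repeat unit reported in the text


def revcomp(seq: str) -> str:
    """Return the reverse complement of a DNA string."""
    return seq.translate(str.maketrans("ACGTacgt", "TGCAtgca"))[::-1]


def longest_tandem_run(window: str, unit: str) -> int:
    """Number of units in the longest uninterrupted tandem run of `unit`."""
    runs = re.findall(f"(?:{unit})+", window.upper())
    return max((len(run) // len(unit) for run in runs), default=0)


def telomere_ends(seq: str, window: int = 200, min_units: int = 3) -> Tuple[bool, bool]:
    """Report whether the left and right scaffold ends carry a telomeric run.

    Both orientations of the unit are tested at each end, because the strand
    on which a scaffold was assembled is arbitrary. `window` and `min_units`
    are hypothetical thresholds chosen for illustration only.
    """
    units = (TELOMERE_UNIT, revcomp(TELOMERE_UNIT))
    left, right = seq[:window], seq[-window:]
    has_left = max(longest_tandem_run(left, u) for u in units) >= min_units
    has_right = max(longest_tandem_run(right, u) for u in units) >= min_units
    return has_left, has_right
```

A scaffold for which both values are true would count as telomere-to-telomere under these assumed thresholds; a scaffold with only one true value would correspond to the "telomeric repeats on one end" category.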
We previously reported a 1104-7 genome assembly (Liang et al., 2020) built from the same set of nanopore reads as used here but with a different assembly workflow (a combination of Canu and Flye), which had 16 scaffolds in total. The sizes of the two assembly versions were similar (58.55 Mb vs. 58.69 Mb), and the variation in the number of scaffolds was caused by the combination of five scaffolds from the previous version (S4, S11, S14-S16) into one scaffold (S1) in the current version (Figure S1). Up to 99.54% and 99.25% of the sequences from the previous and current genomes, respectively, could be aligned using Mummer-based alignment with a strict cut-off (DNA identity >99%, length >10 kb). Given that the two assemblies were highly congruent and the new assembly was more complete, it was used for all subsequent analyses in this work.
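As an illustration of how an aligned fraction under the strict cut-off (DNA identity >99%, length >10 kb) might be computed, the following Python sketch filters alignment records and merges overlapping intervals so that no base is counted twice. The record layout (scaffold, start, end, per cent identity; 1-based inclusive coordinates) and the function name are assumptions made for this example; they do not reproduce the output format of any particular aligner or the exact procedure used in this study.

```python
from typing import Dict, Iterable, List, Tuple

# Hypothetical record: (scaffold, start, end, percent_identity), 1-based inclusive
Alignment = Tuple[str, int, int, float]


def covered_bases(alignments: Iterable[Alignment],
                  min_identity: float = 99.0,
                  min_length: int = 10_000) -> int:
    """Count bases covered by alignments that pass both thresholds."""
    kept: Dict[str, List[Tuple[int, int]]] = {}
    for scaffold, start, end, identity in alignments:
        lo, hi = min(start, end), max(start, end)  # tolerate reverse matches
        if identity > min_identity and (hi - lo + 1) > min_length:
            kept.setdefault(scaffold, []).append((lo, hi))

    total = 0
    for intervals in kept.values():
        intervals.sort()
        cur_lo, cur_hi = intervals[0]
        for lo, hi in intervals[1:]:
            if lo <= cur_hi + 1:          # overlapping or adjacent: extend
                cur_hi = max(cur_hi, hi)
            else:                         # gap: close the current interval
                total += cur_hi - cur_lo + 1
                cur_lo, cur_hi = lo, hi
        total += cur_hi - cur_lo + 1
    return total
```

Dividing the returned value by the total assembly length gives an aligned fraction comparable in spirit to the percentages quoted above.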

| Repetitive elements and bipartite genome architecture of C. fructicola
Our repeat analysis process predicted 2588 repetitive elements (REs) totalling 3.32 Mb in length from the 1104-7 genome. Similarly, 2054 REs totalling 2.43 Mb in length were found in LJ19. At the order level (Wicker et al., 2007), total lengths of TIR DNA transposons, LINE retrotransposons and LTR retrotransposons occupied 38.17%, 32.44% and 29.39%, respectively, of the total predicted TE spaces in 1104-7 (Figure S2a). In LJ19, total lengths of TIRs, LINEs and LTRs occupied 45.51%, 35.39% and 19.10% of the total predicted spaces, respectively (Figure S2a). Compared to LINEs and LTRs, TIRs and unknown elements were generally more divergent from the corresponding consensus sequences (Figure S2b).
Along the scaffolds, REs were dispersed unevenly. First, scaffold ends and presumptive centromere regions were abundant in REs. In 1104-7, REs covered 60.45% of scaffold end spaces (50 kb distance range) and 82.88% of putative centromere spaces (border characterized by a rapid shift in GC content, 0.57 Mb total space). Second, short scaffolds (<1 Mb) that resembled minichromosomes contained considerably greater proportions of REs relative to long scaffolds (Figure S2c). The average RE coverage per cent values for short scaffolds (S11 and S12) in 1104-7 and LJ19 were over five times higher than the equivalent values for long scaffolds (S1-S10). In 1104-7, S11 and S12, which account for 2.64% of the whole genome, contained 12.95% of all annotated REs. Repeat-induced point mutation (RIP) is a crucial genome defence mechanism in fungi that renders repetitive DNA inactive during sexual reproduction. RIP entails C:G to T:A transitions between sequence pairs that share more than 80% identity over a minimum length of 400 bp. C. fructicola sexual reproduction is common in nature and the 1104-7 strain itself produces fertile perithecia on potato dextrose agar (PDA) (Liang et al., 2021). The 1104-7 genome contains orthologues of RID (Cf1104nano2|16180) and Dim-2 (Cf1104nano2|13031) that function in RIP in other fungi. To ascertain whether RIP contributes towards TE silencing in C. fructicola, we calculated the RIP index (CpA+TpG)/(ApC+GpT) for individual REs greater than or equal to 400 bp and belonging to recognized RE families with more than 10 copies. A total of 20 RE families were examined. The average RIP index value for 18 families was less than the genome-wide average (1.22 ± 0.11, 10 kb window).
In addition, the average RIP index values for nine RE families were <0.8 (Figure S2d,e), a RIP-indicative criterion (Hane & Oliver, 2008).
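The RIP index calculation can be sketched in Python as follows. The example assumes the substrate-based form of the index, (CpA+TpG)/(ApC+GpT), in which the numerator counts dinucleotides depleted by RIP and the denominator counts their reverse (non-complemented) counterparts. The helper names, the input layout of (family, sequence) pairs and the choice to apply the copy-number filter after the length filter are illustrative assumptions rather than details taken from this study.

```python
from typing import Dict, Iterable, List, Optional, Tuple


def dinucleotide_counts(seq: str) -> Dict[str, int]:
    """Count overlapping dinucleotides on the given strand."""
    seq = seq.upper()
    counts: Dict[str, int] = {}
    for i in range(len(seq) - 1):
        pair = seq[i:i + 2]
        counts[pair] = counts.get(pair, 0) + 1
    return counts


def rip_substrate_index(seq: str) -> Optional[float]:
    """(CpA + TpG) / (ApC + GpT); None when the denominator is zero."""
    c = dinucleotide_counts(seq)
    denominator = c.get("AC", 0) + c.get("GT", 0)
    if denominator == 0:
        return None
    return (c.get("CA", 0) + c.get("TG", 0)) / denominator


def family_mean_index(elements: Iterable[Tuple[str, str]],
                      min_length: int = 400,
                      min_copies: int = 10) -> Dict[str, float]:
    """Mean index per repeat family.

    Elements shorter than `min_length` are skipped, and only families with
    more than `min_copies` remaining elements are reported (an assumed order
    of filtering).
    """
    by_family: Dict[str, List[float]] = {}
    for family, seq in elements:
        if len(seq) < min_length:
            continue
        index = rip_substrate_index(seq)
        if index is not None:
            by_family.setdefault(family, []).append(index)
    return {fam: sum(vals) / len(vals)
            for fam, vals in by_family.items() if len(vals) > min_copies}
```

Family means falling below a chosen threshold (0.8 in the text) would then be flagged as RIP-indicative.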
These features imply that RIP is active in C. fructicola.
The comparison revealed two distinct groups of chromosomes (scaffolds) (Figure 2a). Group I chromosomes (corresponding to S1-S10 in 1104-7) were long (>3 Mb) and showed high levels of cross-species collinearity; they were designated as conserved core chromosomes.
Differing from group I members, chromosomes (scaffolds) in group II were shorter (0.13-0.92 Mb) and highly divergent among species (Figures 2b and S3); they are referred to as minichromosome-like scaffolds (MLSs). Compared with core chromosomes, MLSs had much lower levels of GC content and gene density, but higher levels of RE content and higher fractions of lineage-specific genes and unannotated genes (Figure S4). This bipartite differentiation pattern was shared by all four CGSC species. In terms of gene functions, transporters and CAZymes were universally enriched for MLSs in all four species, whereas no uniform enrichment of small secreted proteins (SSPs), cytochrome P450s or transcription factors was observed (Figure S5).

| Nascent intraspecific chromosomal rearrangements in C. fructicola
Chromosomal rearrangement events can drive virulence evolution in plant-pathogenic fungi (de Jonge et al., 2013; Faino et al., 2016), the occurrence of which, however, has not been characterized in Colletotrichum. Here, we annotated eight isolate-specific rearrangement events (>10 kb, inversions and translocations) within core chromosomes of C. fructicola based on manual analysis of genome alignment of the four high-quality genomes (1104-7, LJ19, CF413, Nara_gc5). These events include five inversions and three translocations, which are summarized in Figure 3. Rearrangements of minichromosomes were more complicated and were not examined in this study. Long-read mapping validated all synteny breakpoints (BPs) (Figures S7-S14). DNA sequences at each synteny BP were manually annotated to postulate the cause and functional impact of the corresponding DNA rearrangement event (Table S3, Figures S7-S14). DNA inversions tend to accompany inverted repeat elements, as is the case for four out of five chromosomal inversions (1, 3, 4 and 5). Moreover, two of three translocation events (all of which took place in Nara_gc5) involved the integration of lineage-specific DNAs at the BP locations. For translocation 2, the ancestral chromosome (represented by 1104-7 S5) was divided into two segments in Nara_gc5 (3.88 and 0.9 Mb), which merged with two lineage-specific DNAs (0.39 and 1.86 Mb, respectively). For translocation 3, 420 and 590 bp DNA sequences in 1104-7 were replaced by 0.52 and 0.46 Mb lineage-specific DNAs, respectively, in Nara_gc5. Among a total of 16 synteny BPs, four inversion BPs (inversion 2 and 4) and two translocation BPs (translocation 1) were intragenic, which may disrupt the functions of the corresponding genes (Table S3).

FIGURE 4 Secondary metabolite (SM) gene clusters of 1104-7 containing shared homology and synteny with fungal SM clusters producing known metabolites. Grey boxes connect homologous genes identified by BlastP queries. Protein sequence identity information is listed in Table S4.
[...] S4). Among them, cercosporin and betaenone C are phytotoxic, while alternapyrone, ilicicolin H, gliovirin and asperlin have antimicrobial activities (de Jonge et al., 2018; Fujii et al., 2005; Grau et al., 2018; Li et al., 2019; Sherkhane et al., 2017; Singh et al., 2012), metachelin C is a coprogen-type siderophore functioning in iron assimilation (Krasnoff et al., 2014), apicidin is a conserved histone deacetylase inhibitor (Jin et al., 2010) and ACE1 (a cytochalasan-related molecule) is presumed to promote appressorium-mediated penetration in Magnaporthe grisea (Collemare et al., 2008). Therefore, SMs produced by C. fructicola may have diverse roles in microbial pathogenicity, competition and other processes. It is worth noting that SM gene clusters related to ACE1, alternapyrone and cercosporin have also been found in C. higginsianum (Dallery et al., 2017).
[...] 5b), among which two have been demonstrated to be required for full virulence of C. fructicola (Shang et al., 2020, 2024).

| Identification of lineage-specific genes critical for apple GLS pathogenesis
C. fructicola and C. aenigma are known to cause apple GLS disease in China (Chen et al., 2022). C. fructicola has a broad host range, causing damage to over 50 plant species. However, reported intraspecific pathogenic differentiation indicates that this fungus may comprise forms with distinct host preferences (Rockenbach et al., 2016). We [...] S16). In addition, subtelomeric regions were highly variable (Figures 6 and S16). At the gene level, 16,689 (88.28%) genes were designated as C. fructicola core genes (bases with non-zero coverage >95% in all compared C. fructicola isolates), whereas 896 (4.72%) genes were designated as variable (bases with non-zero coverage <80% in at least three isolates). Interestingly, two subregions of the 1104-7 genome (scaffold S1, 3.12-3.74 and 5.88-6.87 Mb) were GLS specific, with considerably greater coverage fractions for reads derived from GLS isolates than for those from non-GLS isolates (Figures 6 and S16). These two regions, totalling 1.61 Mb in length, are referred to as GLS-R1 (GLS-Region 1) and GLS-R2 (GLS-Region 2). Association of GLS-R1 and GLS-R2 with GLS pathogenicity was also observed in C. aenigma. The two regions were specifically present in the two apple-derived GLS isolates (XY15, PC-WS-1) but not in the strawberry-derived isolate Cg56. The trans-species association of GLS-R1 and GLS-R2 with GLS pathogenicity is consistent with a function in GLS pathogenesis.
The GLS-R1 and GLS-R2 regions were highly repetitive (Figure S16). Four hundred and twenty-four repeat elements were identified within these two regions, covering 0.67 Mb total DNA (41.6%), a ratio that is significantly higher than the genome background (5.67%). A total of 333 protein-coding genes were predicted within GLS-R1 and GLS-R2 (Dataset S2). By performing hierarchical clustering, these genes were classified into GLS-specific genes (76 in total), variable genes (208 in total) and GLS-associated genes (49 in total) based on variation in the breadth of coverage among reads derived from 22 C. fructicola and C. aenigma isolates (Figure 7a). GLS-specific genes strongly correlated with GLS pathogenicity in read presence-absence pattern, with bases with non-zero coverage >95% in all six GLS isolates, but <10% in all other isolates. GLS-associated genes were generally specific to the six GLS isolates but may be present in an additional one or two non-GLS isolates, or detected in non-GLS isolates with low breadth of coverage. The distribution of different gene groups within GLS-R1 and GLS-R2 was somewhat compartmentalized. For instance, two regions (3.14-3.45 and 6.31-6.79 Mb) mainly contained GLS-specific genes (44.7%) and GLS-associated genes (31.1%) (Figure 7b). Functional enrichment analysis suggested that GLS-specific genes were enriched with functions related to secondary metabolism (e.g., AMP-binding enzyme, UDP-glucoronosyl and UDP-glucosyl transferase, chorismate binding enzyme) (Tables S6-S8).
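The coverage thresholds quoted above can be turned into a simple rule-based classifier, sketched here in Python. This is a simplification for illustration: the study grouped genes within GLS-R1 and GLS-R2 by hierarchical clustering, whereas the sketch applies the stated cut-offs directly (>95% breadth in all GLS isolates and <10% in all others for GLS-specific genes; >95% in all isolates for core genes; <80% in at least three isolates for variable genes). The function names, the input layout and the precedence of the rules are assumptions.

```python
from typing import Dict, Sequence, Set


def breadth_of_coverage(depths: Sequence[int]) -> float:
    """Fraction of gene bases with non-zero read depth."""
    if not depths:
        return 0.0
    return sum(1 for d in depths if d > 0) / len(depths)


def classify_gene(breadth: Dict[str, float], gls_isolates: Set[str]) -> str:
    """Classify one gene from its per-isolate breadth of coverage.

    `breadth` maps isolate name -> breadth of coverage (0 to 1).
    The GLS-specific rule is tested first because such a gene would
    otherwise also satisfy the 'variable' rule.
    """
    gls = [b for iso, b in breadth.items() if iso in gls_isolates]
    others = [b for iso, b in breadth.items() if iso not in gls_isolates]

    if gls and others and all(b > 0.95 for b in gls) and all(b < 0.10 for b in others):
        return "GLS-specific"
    if all(b > 0.95 for b in breadth.values()):
        return "core"
    if sum(1 for b in breadth.values() if b < 0.80) >= 3:
        return "variable"
    return "unclassified"
```

Genes left "unclassified" by these rules would include cases such as the GLS-associated genes, whose definition in the text is qualitative and is better captured by clustering than by a fixed cut-off.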
To determine whether the identified GLS-R1 and GLS-R2 regions indeed contribute towards GLS pathogenicity, we chose 10 genes within the two regions (GLS pathogenicity candidate genes, GPCGs) for gene deletion analysis (Table 1). These genes were selected with a combined consideration of virulence-related gene function, infection-specific expressional up-regulation (based on previous RNA-seq data) and presence specificity among C. fructicola and C. aenigma GLS isolates. These 10 GPCGs included eight GLS-specific genes (GPCG1, 4, 5, 12, 13, 14, 16 and 17), one GLS-associated gene (GPCG3, a putative β-ketoacyl synthase) and one variable gene (GPCG9, a putative fungal-specific transcription factor). For each gene, two independent deletion mutants were created and used for phenotypic characterization. Figure S17 shows the PCR-based validation of gene deletion events for each gene. In the virulence assay with detached apple leaves, deletion of any of three GPCGs (1, 16, 17) completely abolished GLS lesion formation, whereas deletion of any of the other seven GPCGs (3, 4, 5, 9, 12, 13, 14) had no obvious effect (Figure 8a). The GLS virulence defects of GPCG1, GPCG16 and GPCG17 mutants were consistently observed in both leaf and fruit inoculation assays (Figure 8b,c). The mutant virulence defects for GPCG1 and GPCG16 were fully restored by genetic complementation. Genetic complementation was not attempted for GPCG17 as it encodes a very large protein (NRPS). To determine whether GPCG1, GPCG16 and GPCG17 contribute towards ABR pathogenicity, inoculation assays were performed with prewounded apple fruits, in which case ABR-mimicking rot lesion symptoms would be induced. The results demonstrated that deletion of any of the three genes did not affect rot lesion formation (Figure 8d). We concluded that GPCG1, GPCG16 and GPCG17 are key genes regulating GLS pathogenesis, but dispensable for ABR lesion induction on wounded fruit.
We characterized the functions of GPCG1, GPCG16 and GPCG17 in more detail. Reverse transcription-quantitative PCR (RT-qPCR) quantification of gene expression showed that the three genes had similar expression patterns, with expression levels being low in conidia and in vitro-grown vegetative hyphae, mildly induced in in vitro appressoria and during wounded fruit infection, but considerably higher during apple leaf infection (Figure S18). Expression of the three genes peaked at 60 h post-inoculation (hpi) during apple leaf infection, increasing by roughly 4000-, 130- and 450-fold, respectively, compared to conidia. Deletion of GPCG1, GPCG16 or GPCG17 did not affect fungal colony growth on PDA, perithecial development or conidial germination (Figure S19a,b); more importantly, deletion of these genes did not affect appressorium differentiation or appressorium-mediated penetration on artificial cellophane membrane (Figure S19c). These results suggest that the three genes have post-penetration, virulence-specific functions. To better understand at which infection stage the GPCG1, GPCG16 and GPCG17 genes are functional, we performed histological observations with apple leaf samples inoculated with different isolates (Figure S20). Deletion mutants of the three genes showed normal conidial germination and appressorium differentiation on apple leaf surface but were defective in the development of the primary infectious vesicle and infectious hyphae, suggesting that the mutants are blocked at the early post-penetration infection phase.
We conducted bioinformatics analysis with GPCG1, GPCG16 and GPCG17 to infer their potential biochemical functions and evolution (Figure S21a). The GPCG1 gene is within GLS-R1 and encodes a putative flavin-binding monooxygenase. Within GLS-R2, GPCG16 encodes a small protein (150 amino acids) with no predicted signal peptide or protein domains and GPCG17 encodes a putative nonribosomal peptide synthetase (NRPS). Interestingly, GPCG16 and GPCG17 are adjacent genes separated by 4048 bp. Possibly, GPCG1, GPCG16 and GPCG17, together with additional unidentified genes within GLS-R1 and GLS-R2, cooperatively control the biosynthesis of a secondary metabolite critical for early GLS pathogenesis. To learn about the evolution of the GPCG1, GPCG16 and GPCG17 genes, we examined the distribution patterns of their homologues in the NCBI nr database by performing a BlastP search. In line with the fact that the three genes are GLS-lineage specific, BlastP queries only identified distant homologues (30%-40% amino acid identities). Further phylogenetic analysis confirmed the deep separation of GPCG1, GPCG16 and GPCG17 from their Colletotrichum and non-Colletotrichum homologues (Figure S21b).

| DISCUSSION
Since its discovery in the United States in the 1970s, GLS disease has been reported primarily in North and South America, as well as East Asia (Liang et al., 2022; Velho et al., 2015). Despite the disease's recent origins and narrow geographic reach, pathogen species diversity is very high. To date, nine GLS species have been reported globally, and they are members of three phylogenetically distinct species complexes (CGSC, CASC and CBSC) (Liang et al., 2022). The short disease history, intraspecific pathogenic differentiation and polyphyletic distribution of GLS pathogen species indicate a multiple origin scenario of GLS pathogenicity that involves the horizontal transfer of a pathogenicity determinant(s). However, such an evolutionary possibility has not been investigated. In this study, we performed genome comparisons with C. fructicola and C. aenigma, and identified two accessory genomic regions (GLS-R1, GLS-R2) that were specifically conserved among GLS-pathogenic isolates in both species. Moreover, three genes (GPCG1, GPCG16, GPCG17) within these two regions were found by gene deletion studies to be crucial for GLS pathogenicity. These results highlight the critical involvement of lineage-specific DNA in GLS pathogenicity evolution.
For filamentous fungal pathogens, various processes may drive the evolution of host adaptation, such as chromosomal rearrangement (de Jonge et al., 2013), presence or absence of dispensable minichromosomes (Ma et al., 2010), horizontal gene transfer (McDonald et al., 2019; Zhao et al., 2014), gene gain or loss (Dhillon et al., 2015; Sharma et al., 2014; Zajac et al., 2021) and positive selection of genes (Kobmoo et al., 2018; Sperschneider et al., 2014). At the genome level, the need to strike a balance between the rapid evolution of virulence genes and the maintenance of housekeeping genes has driven the compartmentalization of pathogen genomes, with repeat-rich, fast-evolving, accessory genomic regions acting as a cradle for host-adaptive virulence gene evolution (Croll & McDonald, 2012; Dong et al., 2015; Frantzeskakis et al., 2019). GLS-R1 and GLS-R2 are repeat rich and gene sparse, and phylogenetic analysis of the GPCG1, GPCG16 and GPCG17 genes showed that they are deeply separated from their Colletotrichum and non-Colletotrichum homologues. We postulate that GLS-R1 and GLS-R2 represent horizontally transferred DNAs from an unidentified source. Both GLS-R1 and GLS-R2 localize to scaffold S1, the longest chromosome in 1104-7. In comparison with other C. fructicola genomes (LJ19, CF413, Nara_gc5), the GLS-R1 and GLS-R2 insertions are flanked by 6-bp long direct repeats (CCCTCA and TTTACT, respectively; Figure S22) and no transposon elements were found close to the synteny BPs. Interestingly, two DUF3435 integrase-like genes within the GLS-specific regions (Cf1104nano2|08705 and Cf1104nano2|09336) are located 576 bp from the right synteny BP of GLS-R1 and 450 bp from the left synteny BP of GLS-R2, respectively (data not shown). These characteristics suggest that there might be active mechanisms facilitating the interspecific transfer of GLS-R1 and GLS-R2.

Minichromosome-mediated genome compartmentalization has been demonstrated to be important for virulence evolution in Colletotrichum fungi (Plaumann & Koch, 2020). For instance, intraspecific variation in the number and size of minichromosomes is linked to virulence variation in C. gloeosporioides (He et al., 1998; Masel et al., 1990), one of the two minichromosomes in C. higginsianum is required for full virulence against Arabidopsis thaliana (Plaumann et al., 2018), and in a cross between two strains of C. lentis, a potent minichromosome-associated quantitative trait locus (QTL) accounts for 85% of the virulence variability (Bhadauria et al., 2019). Moreover, a recent comparative genomic study with CGSC species highlights that telomeres and repeat-rich minichromosomes are enriched with virulence-related accessory genomic regions (Gan et al., 2021). In this study, we compared chromosome-level genome assemblies from four CGSC species, C. gloeosporioides, C. siamense, C. aenigma and C. fructicola. While our comparisons highlight clear distinctions between core chromosomes and minichromosomes in evolutionary speed, we did not find a direct correlation between GLS pathogenicity differentiation and the presence-absence pattern of a specific minichromosome or minichromosome region. Yet, it is worth noting that the core chromosome-located GLS-R1 and GLS-R2 strongly resemble CGSC minichromosomes in being repeat rich, gene sparse and enriched with lineage-specific genes. In the blast fungus Magnaporthe oryzae (syn. Pyricularia oryzae), structural rearrangements and segmental duplication of core chromosomes have been indicated to contribute to the emergence of minichromosomes and the reshuffling of virulence-related genes (e.g., effectors) (Langner et al., 2021; Peng et al., 2019). Further genomic comparison studies with additional chromosome-scale assemblies of GLS isolates would be helpful for elucidating the evolutionary history of GLS-R1 and GLS-R2.
Our genome comparisons also demonstrated the intraspecific presence-absence polymorphism of a minichromosome (1104-7 scaffold S12) within C. fructicola. This chromosome was present in 14 out of 19 C. fructicola isolates and a whole genome phylogram (Figure 6) revealed that the five S12-lacking isolates belonged to three polyphyletic clades. S12 appears to be genetically unstable and experiences frequent losses in nature. In Verticillium longisporum, partial or complete deletion of a 20-kb lineage-specific DNA region increases virulence, suggesting that lineage-specific DNA can confer a virulence-attenuating function (Harting et al., 2021). The functional implications of the observed minichromosome loss event in C. fructicola remain to be investigated in the future.
Conidia of gene deletion mutants of GPCG1, GPCG16 and GPCG17 germinated and differentiated appressoria normally on inoculated apple leaf surface. However, post-invasive infectious development of these mutants was very constrained. In a separate assay, these mutants' in vitro appressoria were able to penetrate a cellophane artificial membrane with an efficiency comparable to the wild type. We concluded that GPCG1, GPCG16 and GPCG17 play virulence functions by promoting early infection, perhaps by inhibiting plant defence responses. In line with this hypothesis, GPCG1, GPCG16 and GPCG17 all had in planta-specific gene expression patterns.
FIGURE 6 DNA presence-absence polymorphism along scaffolds S1, S11 and S12 in 1104-7 among Colletotrichum fructicola and C. aenigma isolates. Two Glomerella leaf spot (GLS)-specific regions (GLS-R1 and GLS-R2) on scaffold S1 are indicated by dashed boxes, and scaffolds S11 and S12 are two putative minichromosomes. Relationship of the isolates is based on the genome-wide single-nucleotide polymorphism (SNP) phylogram in Figure S15; isolates shaded in the same colour belong to the same clade. It is worth noting that such a phylogram-based approach only provides a rough estimation of intraspecific relationships due to potential assumption violation. Numbers separated by a slash on the right of isolates indicate the number of leaves showing GLS symptoms and the total inoculated leaves; NA indicates not assessed. For each isolate, DNA sequence read coverage against 1104-7 (10 kb sliding window) is presented as histograms. Histograms of gene density (Gene), repeat element density (RE) and GC content (GC) for the 1104-7 reference scaffolds are presented at the bottom. Note that high read coverage for GLS-R1 and GLS-R2 is specifically observed among GLS-pathogenic isolates (orange), and low read coverage along S12 was observed in five C. fructicola isolates (filled black circle), indicating intraspecific dispensability of this minichromosome.
Functionally, GPCG1 encodes a putative flavin-binding monooxygenase, GPCG17 encodes a putative NRPS and GPCG16 encodes an unknown protein but neighbours GPCG17. These findings suggest that these three genes belong to a gene group that cooperates to catalyse the biosynthesis of a secondary metabolite important for GLS pathogenicity. GPCG1, GPCG16 and GPCG17 lacked closely related homologues both inside and outside of the Colletotrichum genus, complicating the analysis of their evolutionary origin(s). A genetic mapping study has demonstrated a single recessive locus controlling apple GLS resistance (Liu et al., 2017). The presence of a pathogenicity-determining secondary metabolite would therefore be in line with an inverse gene-for-gene interaction model where host-pathogen recognition mediates virulence and disease compatibility. Such a genetic model is similar to the role of host-selective toxins (HSTs) reported for necrotrophic fungi (Wolpert et al., 2002), except that the GLS-associated metabolite functions as a plant defence suppressor. The in planta-specific expression characteristics of GPCG1, GPCG16 and GPCG17 have added difficulty to metabolite isolation and characterization. In the future, screening for genetic regulators or environmental conditions that permit derepressed expression of GPCG genes under in vitro conditions would be important for dissecting the function of the GLS-determining metabolite.
In summary, we have generated chromosome-level genome assemblies for two C. fructicola isolates and identified a conserved bipartite genome architecture involving minichromosomes among several CGSC species. By performing genome resequencing and comparison, we identified two GLS-specific genomic regions and found three genes within these regions to be critical for GLS pathogenesis. These results shed important insights

FIGURE 7 Identification of Glomerella leaf spot (GLS)-specific genes within the two GLS-specific subgenomic regions (GLS-R1 and GLS-R2). (a) Clustering of 333 predicted protein-coding genes into GLS-specific genes, variable genes and others based on variation in the breadth of read coverage (fraction of bases with non-zero coverage) among 22 Colletotrichum fructicola and C. aenigma isolates. Higher heatmap scales indicate higher read coverage breadth. Isolate names are coloured based on the GLS pathogenicity assay result. Orange: GLS-pathogenic; light blue: GLS-nonpathogenic; grey: not assessed but putatively GLS-nonpathogenic. (b) Circos plot showing the distribution of the three gene types along the chromosome. Track 1, gene category information. Orange: GLS-specific genes; black: variable genes; blue: others. Tracks 2-5, breadth of read coverage for PGYGH01 (C. fructicola, GLS-pathogenic), LC03680 (C. fructicola, GLS-nonpathogenic), XY15 (C. aenigma, GLS-pathogenic) and Cg56 (C. aenigma, putatively GLS-nonpathogenic), respectively. Inner links represent major repeat elements considering both unit length and copy number.
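The breadth of read coverage used in Figure 7a is defined as the fraction of bases with non-zero coverage. A minimal Python sketch of that quantity is given here, assuming per-base read depths for a gene are already available as a list of integers. The function names, the toy isolate labels and the 0.5 presence threshold are illustrative assumptions and are not taken from this study, which grouped genes by clustering rather than by a fixed cut-off.

```python
from typing import Dict, Sequence, Set


def coverage_breadth(depths: Sequence[int]) -> float:
    """Return the fraction of positions with non-zero read depth."""
    if not depths:
        return 0.0
    covered = sum(1 for d in depths if d > 0)
    return covered / len(depths)


def is_pathogen_specific(
    breadth_by_isolate: Dict[str, float],
    pathogenic: Set[str],
    threshold: float = 0.5,  # hypothetical cut-off for calling a gene present
) -> bool:
    """True if the gene is present in every pathogenic isolate and absent elsewhere."""
    for isolate, breadth in breadth_by_isolate.items():
        present = breadth >= threshold
        if present != (isolate in pathogenic):
            return False
    return True


# Toy example with made-up isolate labels and depths.
gene_depths = {
    "isoA": [12, 9, 15, 11],   # well covered
    "isoB": [0, 0, 0, 1],      # almost no coverage
}
breadths = {name: coverage_breadth(d) for name, d in gene_depths.items()}
print(breadths)
print(is_pathogen_specific(breadths, pathogenic={"isoA"}))
```

A fixed threshold is the simplest possible rule; a clustering of breadth profiles across isolates, as described in the caption, also separates an intermediate "variable" class that a single cut-off cannot express.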
TABLE 1 Predicted functions of Glomerella leaf spot (GLS) pathogenicity candidate genes (GPCGs) chosen for gene deletion analysis.

FIGURE Assembly of Colletotrichum fructicola 1104-7 and LJ19 genomes from nanopore long reads. (a) Genome-wide Hi-C contact map of 1104-7. Putative centromeres are indicated by cross-like patterns. (b) Dot plot showing the genome alignment between 1104-7 and LJ19. Matches were identified using nucmer in MUMmer; forward matches are in red and reverse matches are in blue. Only highly similar matches (DNA identity >99%, length >10 kb) are shown. (c) 1104-7 genome queried through BlastN with LJ19. The 1104-7 genomic regions lacking matches in LJ19 are highlighted in brown, and GC content in 10 kb sliding windows is shown in blue. Matches were identified by BlastN search with a stringency cut-off of DNA identity >99% and length >10 kb. Note the large drop in GC content near putative centromeres. (d) Lineage-specific regions of the LJ19 genome queried through BlastN with 1104-7.

2.3 | Minichromosome-driven two-speed genome evolution in CGSC

The C. gloeosporioides species complex (CGSC) is made up of more than 20 closely related species. However, the genomic evolution of the complex has gone largely unexplored. Prior to this study, five long-read-based chromosome-level genome assemblies had been established within CGSC. Here, we analysed macrosynteny conservation among 1104-7, LJ19 and the other five CGSC genomes. These seven genomes represent four CGSC species, namely C. fructicola (1104-7, LJ19, CF413, Nara_gc5), C. aenigma (Cg56), C. siamense (Cg363) and C. gloeosporioides (SMCG1#C) (Gan et al., 2021; Huang et al., 2019).
Illumina read mapping failed to detect this chromosome in an additional 4 of the 15 C. fructicola isolates (Figure 6), indicating a plastic presence-absence polymorphism of MLSs within C. fructicola.
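Presence-absence calls of this kind rest on read depth along the reference scaffold. The Python sketch below averages per-base depth in 10 kb windows (the window size given for Figure 6) and flags a scaffold as absent when nearly all windows are essentially uncovered. It assumes per-base depths are already available as a list; the function names, the depth cut-off of 1 and the 0.9 window fraction are hypothetical values chosen for illustration, not parameters reported in this study.

```python
from typing import List, Sequence


def window_means(
    depths: Sequence[int], window: int = 10_000, step: int = 10_000
) -> List[float]:
    """Mean depth per window; a shorter final window is averaged over its own length."""
    means = []
    for start in range(0, len(depths), step):
        chunk = depths[start:start + window]
        means.append(sum(chunk) / len(chunk))
    return means


def scaffold_absent(
    depths: Sequence[int],
    window: int = 10_000,
    min_depth: float = 1.0,      # hypothetical: below this a window counts as uncovered
    min_fraction: float = 0.9,   # hypothetical: share of uncovered windows needed
) -> bool:
    """Call a scaffold absent if most non-overlapping windows have near-zero depth."""
    if not depths:
        raise ValueError("no depth values supplied")
    means = window_means(depths, window, window)
    uncovered = sum(1 for m in means if m < min_depth)
    return uncovered / len(means) >= min_fraction


# Toy example with a small window so the arithmetic is easy to follow.
print(window_means([2, 2, 4, 4, 0, 0], window=2, step=2))
print(scaffold_absent([0] * 100, window=10))
print(scaffold_absent([30] * 100, window=10))
```

Setting `step` smaller than `window` in `window_means` gives overlapping (sliding) windows; the absence call uses non-overlapping windows so that each base is counted once.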
As hemibiotrophs, Colletotrichum species require the dynamic expression of a variety of virulence factors, including plant cell wall-degrading enzymes (PCWDEs), transporters, secondary metabolite (SM) synthetases and secretory effectors (Kleemann et al., 2012; O'Connell et al., 2012). Here, we annotated candidate SM synthetases and candidate effectors within the high-quality C. fructicola 1104-7 genome. Eighty putative SM biosynthesis enzymes (17 DMATs, 15 NRPSs, 5 NRPS-PKS hybrids, 37 PKSs, 6 TSs) were predicted by SMIPS analysis and terpene synthase domain (PF03936) search. These genes were dispersed over scaffolds S1 to S10. Ten putative SM enzymes (12.3%) were found within 200 kb regions of the scaffold ends. Among the four compared CGSC species (C. gloeosporioides, C. siamense, C. aenigma, C. fructicola), 18 (22.2%) of the predicted SM enzymes displayed presence-absence polymorphism, among which one putative NRPS (Cf1104nano2|13135) was specific to C. fructicola. AntiSMASH prediction, homology and synteny analyses revealed 10 SM gene clusters that were strongly syntenic to fungal SM clusters producing known metabolites (Figure 4, Table [...]

[...] predicted 1490 putative secretory proteins in 1104-7, 631 of which were classified as small secretory proteins (SSPs) because they contained fewer than 300 amino acids. Based on OrthoFinder clustering and a local BlastP search against 29 ascomycete genomes (11 non-Colletotrichum and 19 Colletotrichum ones), 331 SSPs were classified into a Colletotrichum genus-specific group (Figure 5a). Using a Nicotiana benthamiana transient expression system, genus-specific SSPs were further characterized for their cell death-suppressive or -promoting activities. Of the 50 screened members, six could suppress BAX-induced cell death (Figure [...]

[...] resequenced 15 C. fructicola isolates and two C. aenigma isolates from different hosts and geographic origins using Illumina technology (Dataset S1) to analyse possible genomic variation associated with GLS pathogenicity evolution. These 17 isolates, as well as 1104-7 and LJ19, were subjected to an artificial inoculation assay (Figures 6 and S15), which demonstrated that all four C. fructicola isolates obtained from apple GLS lesions consistently exhibited GLS symptoms on Gala apple leaves, whereas the remaining 13 C. fructicola isolates from apple fruit or non-apple hosts did not. The two C. aenigma isolates derived from GLS lesions (XY15, PC-WS-1) were also pathogenic on Gala apple leaves. A phylogenetic tree built with 21,741 parsimony-informative single-nucleotide polymorphism (SNP) sites located within 1326 core fungal genes (single-copy core genes derived from BUSCO analysis) separated C. fructicola isolates into four groups, with all four GLS-pathogenic isolates belonging to the same group (Figures 6 and S15). C. aenigma isolates were well separated from C. fructicola isolates (Figures 6 and S15), confirming their species separation. Genomic regions and genes showing variation among C. fructicola and C. aenigma isolates were identified by mapping Illumina reads or PacBio reads from different isolates against the 1104-7 reference genome. Table S5 summarizes the genome resequencing and read mapping statistics for individual isolates. The average read depths ranged between 18.5 and 205.4. As already mentioned, one minichromosome (scaffold S12) exhibits presence-absence polymorphism among the compared C. fructicola isolates (Figures 6 and

FIGURE 3 Schematic representation of identified intraspecific, isolate-specific chromosomal rearrangement events (inversions, INV1, 4, 5; translocations, TRA1, 2, 3) on core chromosomes of Colletotrichum fructicola based on comparison of four high-quality genomes (1104-7, LJ19, Nara_gc5, CF413). For INVs (left), yellow boxes with red arrows indicate putative transposon elements. Purple and cyan boxes in INV3 indicate variable DNAs. Detailed descriptions of DNA rearrangement events for INV1-5 and TRA1-3 are listed in Figures S7 and Table [...]

FIGURE Expression, presence-absence polymorphism and functional characterization of a set of genus-specific effector candidate (EC) genes within C. fructicola. (a) Heatmap showing the expression profiles of ECs (left) and their presence-absence polymorphism patterns among Colletotrichum genomes (right). The transcriptomic heatmap is based on a previously published RNA-seq dataset (Liang, Shang, et al., 2018) and the presence-absence heatmap is based on BlastP queries of C. fructicola ECs against other Colletotrichum genomes. ECs with cell death-suppressive activities (shown in panel b) are coloured in blue. (b) ECs with cell death-suppressive activity based on a Nicotiana benthamiana transient co-expression assay system. Top, the infiltration scheme: leaves were infiltrated with Agrobacterium tumefaciens carrying pGR107 empty vector (EV) or EV inserted with the EC gene (EC), alone or in a mixture with A. tumefaciens expressing the cell death-promoting protein BAX (EV + BAX or EC + BAX). Bottom, identified ECs with death-suppressive activity. For each EC, the number of leaves showing death-suppressive activity and the total number of infiltrated leaves are indicated in parentheses.
a represents a manual combination of three neighbouring genes (Cf1104nano2|09529, Cf1104nano2|09530 and Cf1104nano2|09531) that harbour protein domains related to a nonribosomal peptide synthetase (NRPS). Manual prediction of the putative coding regions and protein-encoding sequence is shown in Dataset S3.

into adaptive evolution in Colletotrichum fungi and lay a foundation for further dissection of the pathogenicity mechanisms of GLS pathogens.

4 | EXPERIMENTAL PROCEDURES

4.1 | Fungal isolates

Detailed information on all C. fructicola and C. aenigma isolates used in this study can be found in Dataset S1. Isolates were cultured on PDA and preserved as 15% glycerol conidial stocks at −80°C in the Fungal Laboratory, College of Plant Protection,

The procedure for genomic DNA extraction and nanopore sequencing of 1104-7 has been described previously (Liang et al., 2020); the same pipeline was applied to LJ19 to obtain nanopore reads. The nanopore reads obtained for 1104-7 and LJ19 were assembled with the software NextDenovo (https://github.com/Nextomics/NextDenovo).

FIGURE Virulence phenotypes of gene deletion mutants of the 10 Glomerella leaf spot (GLS) pathogenicity candidate genes (GPCGs). (a) Lesion symptom appearance on detached Gala apple leaves at 5 days after conidial drop inoculation (10^7 spores/mL). Wild-type (WT) and individual gene deletion mutants (KO) were pair inoculated on individual leaflets, each with two replicates. The number within parentheses represents the KO strain designation. (b) Genetic complementation of the GPCG1 and GPCG16 KO mutants. Conidial suspensions (10^7/mL) were either drop inoculated (left) or spray inoculated (right). K, KO strain. C, complementation strain. (c) Lesion appearance on Gala apple fruit at 15 days after conidial spray inoculation (10^7/mL). (d) Lesion appearance on Gala apple fruit at 5 days after conidial drop inoculation (10^7/mL) at prewounded sites.